{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hacker News aitextgen\n",
    "\n",
    "A demo of how aitextgen can be used to generate bespoke Hacker News submission titles.\n",
    "\n",
    "**NOTE**: This is released as a proof of concept for mini-GPT-2 models; the quality of the titles may vary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aitextgen import aitextgen"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the Hacker News Model\n",
    "\n",
    "The `minimaxir/hacker-news` model was fine-tuned on HN submissions up until May 12th with at least 5 points.\n",
    "\n",
    "It uses a custom GPT-2 architecture that is only 30 MB on disk (compared to the 500 MB of the 124M GPT-2 model).\n",
    "\n",
    "Running the cell below will download the model and cache it in `/aitextgen`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen:Loading minimaxir/hacker-news model from /aitextgen.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "219f11651d264ea89c53d26e51fc1a2e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=539.0, style=ProgressStyle(description_…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9692a8e7fa934ee9b5b0cbe12c95b4a4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=30458628.0, style=ProgressStyle(descrip…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen:Using the tokenizer for minimaxir/hacker-news.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bc616f81fd264615a7a32e2776ea29b5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=74927.0, style=ProgressStyle(descriptio…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "41387b5113ea4e45bee4fc5a790bbf61",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=35091.0, style=ProgressStyle(descriptio…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2427ce1184754b8088601d2a0296411a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=120.0, style=ProgressStyle(description_…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "06052496d23e4c7b8e2c0722ff65b9de",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=2.0, style=ProgressStyle(description_wi…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "ai = aitextgen(model=\"minimaxir/hacker-news\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generation\n",
    "\n",
    "Since the model is so small, generation happens almost immediately, even in bulk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Kinect can now centralize cellphone locations, not their pictures\n"
     ]
    }
   ],
   "source": [
    "ai.generate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Ask HN: Should I start writing a blog post in Python?\n",
      "==========\n",
      "The Psychology of Human Misjudgment (2012)\n",
      "==========\n",
      "New York' New Year: $99 Linux PC\n",
      "==========\n",
      "C++11/12 Released\n",
      "==========\n",
      "Dynamic types in Go\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prompted Input\n",
    "\n",
    "You can seed input with a `prompt` to get specific types of HN posts. The prompt will be **bolded** in the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mAsk HN\u001b[0m: What are some good (O'Reilly eval) books for a new web-based project?\n",
      "==========\n",
      "\u001b[1mAsk HN\u001b[0m: How to avoid the Huawei of 20k job candidates?\n",
      "==========\n",
      "\u001b[1mAsk HN\u001b[0m: How to grow your startup\n",
      "==========\n",
      "\u001b[1mAsk HN\u001b[0m: What's the best way to learn a new languages on your website?\n",
      "==========\n",
      "\u001b[1mAsk HN\u001b[0m: How to get started in Machine Learning?\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"Ask HN\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mShow HN\u001b[0m: The Penetration Tester\n",
      "==========\n",
      "\u001b[1mShow HN\u001b[0m: qVD.S.Next Windows Awesomeness\n",
      "==========\n",
      "\u001b[1mShow HN\u001b[0m: My Startup – a crowdfunded satellite news aggregator\n",
      "==========\n",
      "\u001b[1mShow HN\u001b[0m: The JavaScript Way to Learn JavaScript Within the Web\n",
      "==========\n",
      "\u001b[1mShow HN\u001b[0m: Hacker News like / you read the message\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"Show HN\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mElon Musk\u001b[0m Says Tesla Is a Wireless Carrier Has Been Laying Off\n",
      "==========\n",
      "\u001b[1mElon Musk\u001b[0m’s Family Secretary of Munich Is the New Model 3\n",
      "==========\n",
      "\u001b[1mElon Musk\u001b[0m is a suitable person to learn the originally good\n",
      "==========\n",
      "\u001b[1mElon Musk\u001b[0m's Hyperloop Is a Success\n",
      "==========\n",
      "\u001b[1mElon Musk\u001b[0m’s New Nexus Program\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"Elon Musk\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mGoogle says\u001b[0m its employees are working with Amazon and Apple\n",
      "==========\n",
      "\u001b[1mGoogle says\u001b[0m it’s peaked\n",
      "==========\n",
      "\u001b[1mGoogle says\u001b[0m it is flea banning visible to people who worked in U.S.\n",
      "==========\n",
      "\u001b[1mGoogle says\u001b[0m it will not allow enemy mine to secure sensitive information\n",
      "==========\n",
      "\u001b[1mGoogle says\u001b[0m no to Google for Java\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"Google says\")"
   ]
  },
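  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bold prompt in the outputs above is rendered with ANSI escape sequences, which show up as stray control characters if the printed text is copied somewhere that does not interpret them. A minimal sketch for stripping them, using only the standard library; `strip_ansi` and `ANSI_SGR` are hypothetical helper names, not part of aitextgen:\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Matches ANSI styling sequences such as ESC[1m (bold on) and ESC[0m (reset).\n",
    "ANSI_SGR = re.compile(r\"\\x1b\\[[0-9;]*m\")\n",
    "\n",
    "\n",
    "def strip_ansi(text):\n",
    "    # Hypothetical helper: remove the styling codes, keep the visible text.\n",
    "    return ANSI_SGR.sub(\"\", text)\n",
    "\n",
    "\n",
    "# Sample line in the same shape as the prompted output above.\n",
    "sample = \"\\x1b[1mAsk HN\\x1b[0m: How to grow your startup\"\n",
    "print(strip_ansi(sample))\n",
    "```\n",
    "\n",
    "The same function can be applied to every line of captured output before saving or comparing titles."
   ]
  },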
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bulk Generation to File\n",
    "\n",
    "You can use `generate_to_file()` to generate many HN titles at once and write them to a text file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen:Generating 1,000 texts to ATG_20200517_235441_14821584.txt\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "48e73d0062ea487d8402af3acc06bddc",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, max=1000.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "ai.generate_to_file(1000, batch_size=20)"
   ]
  },
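  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To work with the generated titles afterwards, the file contents have to be split back into individual texts. A minimal sketch, assuming the file separates texts with a line made only of `=` characters, as the printed output earlier in this notebook does; the exact separator written by `generate_to_file()` may differ, and `split_titles` is a hypothetical helper, not part of aitextgen:\n",
    "\n",
    "```python\n",
    "def split_titles(text, min_sep_len=10):\n",
    "    # Assumption: a separator is a line consisting only of \"=\" characters.\n",
    "    titles = []\n",
    "    for line in text.splitlines():\n",
    "        line = line.strip()\n",
    "        is_separator = len(line) >= min_sep_len and set(line) == {\"=\"}\n",
    "        # Keep non-empty lines that are not separators.\n",
    "        if line and not is_separator:\n",
    "            titles.append(line)\n",
    "    return titles\n",
    "\n",
    "\n",
    "# In-memory sample in the same shape as the printed output.\n",
    "sample = \"Dynamic types in Go\\n==========\\nC++11/12 Released\\n\"\n",
    "print(split_titles(sample))\n",
    "```\n",
    "\n",
    "For a real run, pass in the contents of the generated file, for example `open(path, encoding=\"utf-8\").read()`, where `path` is the file name reported in the log message above."
   ]
  },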
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MIT License\n",
    "\n",
    "Copyright (c) 2020 Max Woolf\n",
    "\n",
    "Permission is hereby granted, free of charge, to any person obtaining a copy\n",
    "of this software and associated documentation files (the \"Software\"), to deal\n",
    "in the Software without restriction, including without limitation the rights\n",
    "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
    "copies of the Software, and to permit persons to whom the Software is\n",
    "furnished to do so, subject to the following conditions:\n",
    "\n",
    "The above copyright notice and this permission notice shall be included in all\n",
    "copies or substantial portions of the Software.\n",
    "\n",
    "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
    "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
    "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
    "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
    "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
    "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n",
    "SOFTWARE."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.5 64-bit",
   "language": "python",
   "name": "python37564bitb9ff4e3157b244a896f88d1e5f3eb324"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}